A direct approach to sparse discriminant analysis in ultra-high dimensions
Authors
Abstract
Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among features and thus could produce misleading feature selection and inferior classification. We propose a new procedure for sparse discriminant analysis, motivated by the least squares formulation of linear discriminant analysis. To demonstrate our proposal, we study the numerical and theoretical properties of discriminant analysis constructed via lasso penalized least squares. Our theory shows that the method proposed can consistently identify the subset of discriminative features contributing to the Bayes rule and at the same time consistently estimate the Bayes classification direction, even when the dimension can grow faster than any polynomial order of the sample size. The theory allows for general dependence among features. Simulated and real data examples show that lassoed discriminant analysis compares favourably with other popular sparse discriminant proposals.
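The abstract's key idea, that sparse linear discriminant analysis can be carried out through a lasso-penalized least squares fit, can be illustrated with a minimal sketch. The encoding of class labels, the synthetic data, and the penalty level below are all illustrative assumptions, not the paper's exact formulation; the point is only that regressing a centered class coding on the features with an ℓ1 penalty yields a sparse estimate of the discriminant direction.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 50

# Two Gaussian classes that differ only in the first 3 features
# (the "discriminative" subset the method should recover).
mu = np.zeros(p)
mu[:3] = 1.5
X0 = rng.normal(size=(n // 2, p))
X1 = rng.normal(size=(n // 2, p)) + mu
X = np.vstack([X0, X1])

# A centered numeric coding of the two class labels (illustrative choice).
y = np.concatenate([np.full(n // 2, -0.5), np.full(n // 2, 0.5)])

# Lasso-penalized least squares: the fitted coefficient vector is a
# sparse estimate (up to scale) of the discriminant direction.
fit = Lasso(alpha=0.05).fit(X, y)
beta = fit.coef_

print("nonzero coefficients at features:", np.flatnonzero(beta))
```

Because only the first three features carry class signal, the lasso shrinks most of the remaining 47 coefficients exactly to zero, so the nonzero support approximates the discriminative subset.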
Related papers
Semiparametric Sparse Discriminant Analysis in Ultra-High Dimensions
In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its incompetence for high-dimensional classification (Witten & Tibshirani 2011, Cai & Liu 2011, Mai et al. 2012, Fan et al. 2012). In this paper, we develop high-dimensional semiparametric sparse discriminant analysis (HD-SeSDA) that generalizes the normal-theory discriminant...
Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures
We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for le...
Sparse semiparametric discriminant analysis
In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its incompetence for high-dimensional classification (Witten and Tibshirani, 2011, Cai and Liu, 2011, Mai et al., 2012 and Fan et al., 2012). In this paper, we develop high-dimensional sparse semiparametric discriminant analysis (SSDA) that generalizes the normal-theory discr...
Pinpointing the classifiers of English language writing ability: A discriminant function analysis approach
The major aim of this paper was to investigate the validity of language and intelligence factors for classifying Iranian English learners` writing performance. Iranian participants of the study took three tests for grammar, breadth, and depth of vocabulary, and two tests for verbal and narrative intelligence. They also produced a corpus of argumentative writ...
A Note On the Connection and Equivalence of Three Sparse Linear Discriminant Analysis Methods
In this paper we reveal the connection and equivalence of three sparse linear discriminant analysis methods: the ℓ1-Fisher's discriminant analysis proposed in Wu et al. (2008), the sparse optimal scoring proposed in Clemmensen et al. (2011) and the direct sparse discriminant analysis proposed in Mai et al. (2012). It is shown that, for any sequence of penalization parameters, the normalized sol...
Journal:
Volume / Issue:
Pages: -
Publication date: 2012